home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
ftp.cs.arizona.edu
/
ftp.cs.arizona.edu.tar
/
ftp.cs.arizona.edu
/
tsql
/
doc
/
tsql.mail
/
000052_csj@iesd.auc.dk _Mon Mar 29 16:26:56 1993.msg
< prev
next >
Wrap
Internet Message Format
|
1996-01-31
|
15KB
Received: from iesd.auc.dk by optima.cs.arizona.edu (5.65c/15) via SMTP
id AA21457; Mon, 29 Mar 1993 07:27:01 MST
Received: from yellow.iesd.auc.dk by iesd.auc.dk with SMTP id AA22405
(5.65c8/IDA-1.5/MD for <tsql@cs.arizona.edu>); Mon, 29 Mar 1993 16:26:56 +0200
Date: Mon, 29 Mar 1993 16:26:56 +0200
From: "Christian S. Jensen" <csj@iesd.auc.dk>
Message-Id: <199303291426.AA22405@iesd.auc.dk>
To: tsql@cs.arizona.edu
Subject: Revised db schema proposal
Dear colleague,
I have now prepared a new proposal for the database schema for the
TSQL Benchmark. I hope it takes into account the recent comments by
various contributors and helps ensure progress. The proposal is
appended at the end of this message. Below, I have included various
comments related to the proposal.
Let me summarize the rationale behind the original straw proposal for
the database schema. It was based on two observations. First, making a
comprehensive benchmark of temporal database queries is a large and
complex task, particularly when there are multiple contributors.
Second, the goal of this effort is to produce a benchmark in time for
the TDB workshop in mid-June. Consequently, a versioned approach was
considered essential. In response, a number of restrictions were
imposed on which types of queries should be considered initially, and
an attempt was made to use a relatively simple, initial database
schema.
Several contributors have requested that more kinds of queries be
included and that the database schema be extended with various
attributes. I have tried to meet those demands while still maintaining
a versioned approach (please refer to the appended document when
reading the following).
The schema for Emp has been enlarged and now contains the attributes
Name, Salary, Dept, Gender, B-day, and Skills.
Note that a candidate Bloodtype has been proposed as a substitute for
Gender. On the other hand, Ed and Patrick argue that Gender is a
frequently used attribute that allows one to make many meaningful
queries. Note also that the prescence of both Salary and Dept
introduces "functional redundancy." Thus one of these may be removed,
or Skills may be removed if Dept is changed to Depts (and becomes
multivalued). Birthday is the only user-defined time attribute. Some
functional redundancy is introduced as two time-invariant attributes
(Birthday and Gender) coexist. A minimalized schema with no functional
redundancy will contain the attributes Name, Dept, B-day, and Skills.
I have not included an attribute such as Temperature. It is hard to
find an good place for it, and including it seems to be contrary to
the focus of most temporal data models. Many models cannot store such
information, so not much may be said about how well they express
related queries.
The question was raised whether there are examples of user-defined
time attributes which are "multivalued." If we add an attribute
"Birthdays-of-children" (the company sends presents to children of
employees) to the Emp schema then Name -->-> Birthdays-of-children.
Note that the new, expanded schema is not even in 3NF. That is the
price paid for introducing the Skills attribute.
I feel that most progress will be made if we propose complete database
schemas from now on and motivate the schemas. For example, one could
propose that an attribute is removed from the schema in the document
below, for some good reason. In any case, we need to settle on a
schema fairly quickly, if we are going to make our June deadline for
the entire benchmark.
Regarding the restrictions on the types of queries to be considered, I
propose a cautious approach where we lift restrictions (candidates
have already been indicated by contributors) at a later point, if (or
when) it becomes clear that too few queries (e.g., less than 100) fall
within the scope.
It has also been pointed out that some of the restrictions may not be
very operational. I agree with that observation. Such useless
restrictions will be removed when it becomes completely clear that
they are indeed useless.
Best regards,
Christian
\documentstyle[11pt]{article}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% This paper is intended to evolve into the first version of the TSQL
% benchmark.
% The purpose of this draft is to settle on a database schema for the
% benchmark.
% The purpose of the next draft is to settle on an instance for the
% agreed-upon database schema.
% The purpose of the following draft is then to define a taxonomy to
% be used for categorizing the benchmark queries that will follow.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\addtolength{\textwidth}{1.485in}
\setlength{\oddsidemargin}{.1in}
\setlength{\evensidemargin}{.1in}
\addtolength{\topmargin}{-.85in}
\addtolength{\textheight}{1.8in}
\newenvironment{prog} { \begin{center} \begin{minipage}{3in}
\begin{tabbing} nnnn\=nnnn\=nnnn\=nnnn\=nnnn\=nnnn\=nnnn\=\kill
}{\end{tabbing} \end{minipage} \end{center}}
\long\def\comment#1{}
\newcommand{\mvd}{\mbox{$\rightarrow \!\!\! \rightarrow \;$}}
\newcommand{\autsp}{$\;\;\;$}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% PAPER START
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{document}
\title{\Large\bf The TSQL Benchmark \\ DRAFT}
\author{James Clifford \autsp Shashi K.~Gadia \autsp Fabio Grandi
\autsp Christian S.~Jensen \\ Patrick Kalua \autsp Edward
Robinson \autsp John F.~Roddick \\ Maria Rita Scalas \autsp
Richard T.~Snodgrass \autsp Abdullah Tansel \autsp Paolo Tiberio}
%Note that the list of authors is preliminary and may not be accurate!
\date{March 29, 1993}
\maketitle
\section{Introduction}
\subsection{Goal}
The central goal of this document is to provide the temporal database
community with a {\em comprehensive consensus benchmark} for temporal
query languages that is {\em independent} of any existing language
proposal.
This is not a performance benchmark, but is rather a {\em semantic}
benchmark intended to be an aid in evaluating the user-friendliness of
proposals for temporal query languages. Thus, temporal query languages
should ideally be able to express the benchmark queries both
conveniently and naturally.
To obtain a consensus benchmark, researchers in temporal databases
have been invited to participate in this initiative, and each researcher
that has contributed significantly will be a coauthor. The electronic
mail distribution {\tt tsql@cs.arizona.edu} is used as the medium for
discussing the benchmark and related issues.
As a consequence of the central goal above, no existing temporal data
models are used or mentioned. The relation schemas of the benchmark
are expressed as sets of attributes, with the temporal aspects being
implicit (of course, specific temporal data models might add explicit
temporal attributes). The contents of the relations are described in
natural language. The benchmark queries are also given only in natural
language.
The benchmark is {\em not} intended to constitute a metric for query
language completeness, and as such it is not a substitute for a
rigorous {\em theoretical} study of expressive powers of various
temporal query languages. Comprehensiveness of the benchmark is
desirable only to ensure that all aspects of query language design are
covered.
It it emphasized that using the benchmark as an advanced, quantitative
scoring system for comparing languages makes little sense. Thus, one
language is not necessarily superior to another just because one is
capable of expressing more benchmark queries than the other. Rather,
the focus is on user-friendliness.
\subsection{Scope}
Within certain boundaries, discussed next, the benchmark is intended
to contain all queries that appear reasonable and that are consistent
with the schema and data of the benchmark.
First, the benchmark is of a semantic nature---in its current form, it
is not aimed at performance comparisons. The intention is to provide a
foundation for comparing the descriptive and operational
characteristics and capabilities of temporal query languages, not
their performance characteristics. Properly extended with additional
relation schemas and a variety of large instances, the benchmark can
also be used for performance comparisons.
Second, a number of restrictions are imposed on which types of queries
are admissible in this version of the benchmark, including the
following.
\begin{itemize}
\item Queries are restricted to valid time only. Transaction-time
related queries are not explored.
\comment{
\item User-defined time, including the interaction between
user-defined time and valid time, is not considered.}
\item Schema evolution and versioning are not considered.
\item Incompleteness is not considered.
\item Recursive queries are not included.
\item Nested queries are not included.
\item General temporal reasoning is beyond the scope of this version
of the benchmark.
\item For simplicity, each relation is used only once in each query.
\item Queries involving aggregation facilities are not considered.
\comment{
\item Queries involving relations with multivalued dependencies (in
the snapshot sense) are not explored.}
\item Only queries are included---updates are not considered.
\end{itemize}
These advanced aspects are excluded solely for pragmatic reasons, and
the exclusion is not meant to imply in any way that the aspects are
not important. The restrictions simply represent an attempt to reduce
the size of the initial benchmark to manageable proportions.
It is emphasized that this benchmark is merely the first in a sequence
of ever-more comprehensive benchmarks. Later benchmarks will relax the
above restrictions on the scope of comprehensiveness imposed on this
benchmark.
\comment{
Specifically, the next version of the benchmark is intended to include
queries that use the same relation more than once, utilize
aggregation, and involve both valid time and user-defined time.}
\section{The Benchmark Database Schema}
\subsection{Criteria}
A suitable database schema for the benchmark should satisfy three
criteria.
\begin{itemize}
\item{} The schema should be simple. This will aid in making the
benchmark easy to understand. This criterion restricts the number of
relation schemas and the number of attributes of the individual
schemas. Additionally, the names of the relations and of the
attributes should be short, as they will be referenced repeatedly.
When an extension is proposed, the benefits should be carefully
compared with the added complexity.
\item{} The schema should allow for comprehensiveness within the
chosen scope. Using the schema, it should be possible formulate
queries of all the types that appear reasonable.
This indicates a need for at least two related relation schemas (for
natural join queries).
\item{} A schema that has already been used frequently is preferred
over a new schema. This guarantees that many existing queries can be
adapted easily to the benchmark.
\end{itemize}
\section{The Benchmark Database Schema}
\subsection{Criteria}
A suitable database schema for the benchmark should satisfy four
criteria.
\begin{itemize}
\item{} The schema should be natural. That is, it should correspond to
a reasonable, though possibly greatly simplified, segment of the
real world. This both reduces the need to explain the model and
enhances the ability to recognize verball pitfalls in the path to
the query instances.
\item{} The schema should be simple. This will aid in making the
benchmark easy to understand. This criterion restricts the number of
relation schemas and the number of attributes of the individual
schemas. Additionally, the names of the relations and of the
attributes should be short, as they will be referenced repeatedly.
When an extension is proposed, the benefits should be carefully
compared with the added complexity.
\item{} The schema should allow for comprehensiveness within the
chosen scope. Using the schema, it should be possible formulate
queries of all the types that appear reasonable.
This indicates a need for at least two related relation schemas (for
natural join queries).
\item{} A schema that has already been used frequently is preferred
over a new schema. This guarantees that many existing queries can be
adapted easily to the benchmark.
\end{itemize}
\subsection{The Schema}
The database schema consists of two valid-time relation schemas, {\tt
Emp} and {\tt Mgr}. In the terminology of the entity-relationship
model, the first schema models an entity set, and the second a
relationship set. They are defined as follows.
Relation {\tt Emp} uses the attributes {\tt Name}, {\tt Salary}, and
{\tt Dept} for recording the relationships between employees,
salaries, and departments. In addition, it contains attributes {\tt
Gender}, {\tt B-day}, and {\tt Skills} which indicate the gender,
birthday, and skills of an employee.
Relation {\tt Mgr} records the association of employees, as managers,
with departments, and it contains two attributes, {\tt Dept} and {\tt
Manager}.
Attributes {\tt Name}, {\tt Dept}, {\tt Skills}, and {\tt Manager} are
of type {\tt textString}; attribute {\tt Gender} is one of {\tt F}
(female) and {\tt M} (male); {\tt Salary} is of type {\tt integer};
and {\tt Birthday} is a user-defined time value which may be compared
with valid times.
The relation schemas obey the following {\em snapshot} functional
and multivalued dependencies:
\begin{prog}
For {\tt Emp}: \\
\> {\tt Name} $\rightarrow$ {\tt Salary} \\
\> {\tt Name} $\rightarrow$ {\tt Dept} \\
\> {\tt Name} $\rightarrow$ {\tt Gender} \\
\> {\tt Name} $\rightarrow$ {\tt B-day} \\
\> {\tt Name} $\mvd$ {\tt Skills} (and {\tt Name} $\not\rightarrow$
{\tt Skills}) \\
For {\tt Mgr}: \\
\> {\tt Dept} $\rightarrow$ {\tt Manager}
\end{prog}
Note that {\tt Name} and {\tt Skills} is the primary key of {\tt Emp}
(it is the only candidate key). The schema is not in third normal form
as, e.g., {\tt Name} $\rightarrow$ {\tt Salary} is a non-trivial
functional dependency where {\tt Name} is not a superkey for {\tt Emp}
and {\tt Salary} is not part of a candidate key for {\tt Emp}.
Attributes {\tt Gender} and {\tt Birthday} are both time-invariant.
Schema {\tt Mgr} is in snapshot Boyce-Codd normal form.
The attribute {\tt Manager} of {\tt Mgr} is a foreign key for the
attribute {\tt Name} of {\tt Emp}. Thus, a tuple is allowed to exist
in the {\tt Mgr} relation only if, for each non-empty snapshots of
this tuple, the {\tt Manager} attribute value exists as a {\tt Name}
value of some tuple in the simultaneous snapshot of the {\tt Emp}
relation.
\end{document}